18 research outputs found

    On Approaches to Discretisation of Stylometric Data and Conflict Resolution in Decision Making

    23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems - early article
    The paper presents research on unsupervised and supervised discretisation of input data used in stylometric tasks of authorship attribution. Based on a numeric characterisation of writing styles, authorship is recognised by decision rules, as their transparent structure enhances understanding of the discovered knowledge. The performance of rule classifiers, constructed with the rough set approach, is studied in the context of the strategy employed for resolving conflicts. It is also contrasted with that of other selected inducers.

    Discretisation of conditions in decision rules induced for continuous

    Typically, discretisation procedures are implemented as part of the initial pre-processing of data, before knowledge mining is employed. This means that conclusions and observations are based on reduced data, as discretisation usually discards some information. The paper presents a different approach, in which discretisation is executed after data mining. In the described study, decision rules were first induced from real-valued features. Second, the data sets were discretised. In the third step, using the categories found for attributes, the conditions included in the inferred rules were translated into the discrete domain. The properties and performance of rule classifiers were tested in the domain of stylometric analysis of texts, where writing styles were defined through quantitative attributes of continuous nature. The experiments show that the proposed processing leads to sets of rules with significantly reduced sizes while maintaining the quality of predictions, and allows many data discretisation methods to be tested at acceptable computational cost.
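The three-step idea in this abstract (induce rules on real values, discretise, then translate rule conditions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule representation (premises only, as tuples), the attribute names `freq_and` and `freq_but`, the cut points, and the choice to map each threshold to the index of the interval that contains it are all hypothetical.

```python
from bisect import bisect_right

def to_category(value, cuts):
    """Index of the interval containing value; cuts must be sorted ascending."""
    return bisect_right(cuts, value)

def translate_rule(rule, cut_points):
    """Replace each real-valued threshold with the category of its interval."""
    return tuple(
        (attr, op, to_category(threshold, cut_points[attr]))
        for attr, op, threshold in rule
    )

def translate_rule_set(rules, cut_points):
    """Translate every rule and keep only the first copy of each distinct result."""
    translated = []
    for rule in rules:
        discrete = translate_rule(rule, cut_points)
        if discrete not in translated:
            translated.append(discrete)
    return translated

# Hypothetical cut points found by some discretisation method.
cut_points = {"freq_and": [0.2, 0.5], "freq_but": [0.1]}

# Hypothetical rule premises induced from continuous features.
rules = [
    (("freq_and", ">=", 0.35), ("freq_but", "<=", 0.05)),
    (("freq_and", ">=", 0.40), ("freq_but", "<=", 0.08)),
    (("freq_and", ">=", 0.60),),
]

discrete_rules = translate_rule_set(rules, cut_points)
```

In this toy case the first two premises fall into the same intervals and collapse into one discrete rule, which is one plausible way a rule set could shrink after translation.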

    Condition attributes, properties of decision rules, and discretisation : analysis of relations and dependencies

    When mining of input data is focused on rule induction, the knowledge discovered in exploring existing patterns is stored in combinations of conditions on attributes included in rule premises, leading to specific decisions. Through their properties, such as lengths, supports, and cardinalities of rule sets, inferred rules characterise relations detected among variables. The paper presents research dedicated to the analysis of these dependencies, considered in the context of various discretisation methods applied to input data from the stylometric domain. For induction of decision rules from data, the Classical Rough Set Approach was employed. Next, based on rule properties, several factors were proposed and evaluated, reflecting characteristics of the available condition attributes. They made it possible to observe how variables and rule sets changed depending on the applied discretisation algorithms.

    Heuristic-based feature selection for rough set approach

    The paper presents a research methodology dedicated to the application of greedy heuristics as a way of gathering information about available features. The discovered knowledge, represented in the form of generated decision rules, was employed to support the feature selection and reduction process for induction of decision rules with the classical rough set approach. Observations were made over input data sets discretised by several methods. Experimental results show that elimination of less relevant attributes through the proposed methodology led to rule sets with reduced cardinalities, while maintaining the rule quality necessary for satisfactory classification.

    Reduct-based ranking of attributes

    The paper is dedicated to feature selection, in particular to attribute rankings that allow the importance of variables to be estimated. For ranking construction, a new weighting factor was defined, based on relative reducts. A reduct constitutes an embedded mechanism of feature selection, specific to rough set theory. The proposed factor takes into account the number of reducts in which a given attribute occurs, as well as the lengths of those reducts. Two approaches to reduct generation were employed and compared, with the search executed by a genetic algorithm. To validate the usefulness of the reduct-based rankings in the process of feature reduction, sets of decision rules were induced in the classical rough set approach for gradually decreasing subsets of attributes selected through the rankings. The performance of all rule classifiers was evaluated, and experimental results showed that, for reduced sets of features, the proposed rankings led to classification accuracy at least equal to, and sometimes higher than, that obtained with the entire set of condition attributes. The experiments were performed on datasets from the stylometry domain, treating authorship attribution as a classification task and stylometric descriptors as characteristic features defining writing styles.
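A ranking that depends on how many reducts contain an attribute and on how long those reducts are could look like the sketch below. The abstract does not state the actual weighting factor, so the formula used here (each reduct contributes 1/length to every attribute it contains) is an assumption chosen only for illustration, and the attribute names and reducts are hypothetical.

```python
from collections import defaultdict

def rank_attributes(reducts):
    """Score attributes over a collection of reducts and sort by descending score.

    Assumed weighting (not taken from the paper): an attribute gains
    1 / len(reduct) for every reduct it belongs to, so membership in many
    short reducts gives a high score.
    """
    scores = defaultdict(float)
    for reduct in reducts:
        for attr in reduct:
            scores[attr] += 1.0 / len(reduct)
    # Ties are broken alphabetically to keep the ordering deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical reducts over attributes a, b, c, d.
reducts = [{"a", "b"}, {"a", "c", "d"}, {"b", "d"}]
ranking = rank_attributes(reducts)
ordered_names = [name for name, _ in ranking]
```

Removing attributes from the bottom of `ordered_names` one at a time and inducing rules on each remaining subset would mirror the validation loop the abstract describes.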

    Assessing quality of decision reducts

    The paper presents research focused on decision reducts, a feature reduction mechanism inherent to rough set theory. As a reduct preserves the discriminative properties of attributes with respect to the described concepts, from the point of view of data representation the length of a reduct is considered the most important measure of its quality. However, such an approach is insufficient when the performance of a reduct-based rule classifier applied to test samples is taken into account. When many reducts of the same length are available, they can lead to vastly different predictions. The paper describes the proposed procedure for iterative reduct generation, which decreases the diversity in the observed levels of accuracy and thus supports reduct selection. The procedure was applied to binary classification with balanced classes, for the stylometric task of authorship attribution.

    Polish statement on food allergy in children and adolescents

    An adverse food reaction is defined as clinical symptoms occurring in children, adolescents or adults after ingestion of a food or chemical food additives. This reaction does not occur in healthy subjects. In certain individuals it is a manifestation of the body's hypersensitivity, i.e. a qualitatively altered response to the consumed food. The disease symptoms observed after ingestion of the food can be triggered by two pathogenetic mechanisms; this allows adverse food reactions to be divided into allergic and non-allergic food hypersensitivity (food intolerance). Food allergy is defined as an abnormal immune response to ingested food (humoral, cellular or mixed). Non-immunological mechanisms (metabolic, pharmacological, microbiological or other) are responsible for clinical symptoms after food ingestion which occur in non-allergic hypersensitivity (food intolerance). Food allergy is considered a serious health problem in modern society. The prevalence of this disorder is varied and depends, among other factors, on the study population, its age, dietary habits, ethnic differences, and the degree of economic development of a given country. It is estimated that food allergy occurs most often among the youngest children (about 6-8% in infancy); the prevalence is lower among adolescents (approximately 3-4%) and adults (about 1-3%). The most common, age-dependent causes of hypersensitivity, expressed as sensitization or allergic disease (food allergy), are food allergens (trophoallergens). These are glycoproteins of animal or plant origin contained in: cow's milk, chicken egg, soybean, cereals, meat and fish, nuts, fruits, vegetables, molluscs, shellfish and other food products. Some of these allergens can cause cross-reactions, occurring as a result of concurrent hypersensitivity to food, inhaled or contact allergens.
The development of an allergic process is a consequence of adverse health effects on the human body of different factors: genetic, environmental and supportive. In people predisposed (genetically) to atopy or allergy, the development of food allergy is determined by four allergic-immunological mechanisms, which were classified and described by Gell and Coombs. It is estimated that in approximately 48-50% of patients, allergic symptoms are caused only by the type I reaction, the IgE-mediated (immediate) mechanism. In the remaining patients, symptoms of food hypersensitivity are the result of other pathogenetic mechanisms, non-IgE mediated (delayed, late) or mixed (IgE mediated, non-IgE mediated). The clinical symptomatology of food allergy varies individually and depends on the type of food-induced pathogenetic mechanism responsible for the symptoms. The symptoms relate to the organ or system in which the allergic reaction has occurred (the effector organ). Most commonly the symptoms involve many systems (gastrointestinal tract, skin, respiratory system, other organs), and approximately 10% of patients have isolated symptoms. The time of symptom onset after eating the causative food is varied and determined by the pathogenetic mechanism of the allergic immune reaction (immediate, delayed or late symptoms). In the youngest patients, the main cause of food reactions is allergy to cow’s milk. In developmental age, the clinical picture of food allergy can change, as reflected in the so-called allergic march, which is the result of anatomical and functional maturation of the effector organs, affected by various harmful allergens (ingested, inhaled, contact allergens and allergic cross-reactions). The diagnosis of food allergy is a complex, long-term and time-consuming process, involving analysis of the allergic history (personal and family), a thorough evaluation of clinical signs, as well as correctly planned allergic and immune tests.
The underlying cause of diagnostic difficulties in food allergy is the lack of a single universal laboratory test to identify both IgE-mediated and non-IgE mediated as well as mixed pathogenetic mechanisms of allergic reactions triggered by harmful food allergens. In food allergy diagnostics it is only possible to identify an IgE-mediated allergic process (skin prick tests with food allergens, levels of specific IgE antibodies to food allergens). This allows one to confirm the diagnosis in patients whose symptoms are triggered by this pathogenetic mechanism (about 50% of patients). The method allowing one to conclude on the presence or absence of food hypersensitivity and its cause is a food challenge test (open, blinded, placebo-controlled). The occurrence of clinical symptoms after the administration of a food allergen confirms the cause of food allergy (positive test), whereas the time elapsing between ingestion of the triggering dose and the occurrence of clinical symptoms indicates the pathogenetic mechanism of food allergy (immediate, delayed, late). The mainstay of causal treatment is temporary removal of the harmful food from the patient’s diet, with the introduction of substitute ingredients with a nutritional value equivalent to the eliminated food. The duration of dietary treatment should be determined individually, and the measures of the effectiveness of the therapeutic elimination diet should include the absence or relief of allergic symptoms as well as normal physical and psychomotor development of the treated child. An alternative to dietary treatment of food allergy is specific induction of food tolerance by intended contact of the patient with the native or thermally processed harmful allergen (oral immunotherapy). This method has been used in the treatment of IgE-mediated allergy (to cow's milk protein, egg protein, peanut allergens). The obtained effect of tolerance is usually temporary.
In order to avoid unnecessary prolongation of treatment in a child treated with an elimination diet, it is recommended to perform a food challenge test at least once a year. This test allows one to assess the body's current ability to acquire immune or clinical tolerance. A negative result of the test makes it possible to return to a normal diet, whereas a positive test is an indication for continued dietary treatment (persistent food allergy). Approximately 80% of children diagnosed with food allergy in infancy "grow out" of the disease before the age of 4-5 years. In children with non-IgE mediated food allergy the acquisition of food tolerance is faster and occurs in a higher percentage of treated patients compared to children with IgE-mediated food allergy. Pharmacological treatment is a necessary adjunct to dietary treatment in food allergy. It is used to control rapidly increasing allergic symptoms (temporarily) or to achieve remission and to prevent relapses (long-term treatment). Preventive measures (primary prevention of allergies) are recommended for children born in a "high risk" group for the disease. These are comprehensive measures aimed at preventing sensitization of the body (an appropriate way of feeding the child, avoiding exposure to some allergens and adverse environmental factors). First of all, infants should be breast-fed during the first 4-6 months of life, and solid foods (non-milk products, including those containing gluten) should be introduced no earlier than 4 months of age, but no later than 6 months of age. An elimination diet is not recommended for pregnant women (prevention of intrauterine sensitization of the fetus and unborn child). The merits of introducing an elimination diet in mothers of exclusively breast-fed infants, when the child responds with allergic symptoms to the specific diet of the mother, are disputable.
Secondary prevention focuses on preventing the recurrence of already diagnosed allergic disease; tertiary prevention is the fight against organ disability resulting from the chronicity and recurrences of an allergic disease process. Food allergy can adversely affect the physical development and the psycho-emotional condition of a sick child, and significantly interfere with his social contacts with peers. A long-term disease process, recurrence of clinical symptoms, and a difficult course of elimination diet therapy are factors that impair the quality of life of a sick child and his family. The economic costs generated by food allergies affect both the patient's family budget (in the household) and the overall financial resources allocated to health care (at the state level). The adverse socio-economic effects of food allergy can be reduced by educational activities in the patient’s environment and dissemination of knowledge about the disease in society.

    Pruning Decision Rules by Reduct-Based Weighting and Ranking of Features

    Methods and techniques of feature selection support expert domain knowledge in the search for the attributes that are most important for a task. These approaches can also be used to tailor the obtained solutions more closely when dimensionality reduction is aimed not only at variables but also at learners. The paper reports on research where attribute rankings were employed to filter induced decision rules. The rankings were constructed through the proposed weighting factor based on the concept of decision reducts, a feature reduction mechanism embedded in rough set theory. Classical rough sets operate only in a discrete input space, through the indiscernibility relation. Replacing it with dominance enables processing of real-valued data. Decision reducts were found for both numeric and discrete attributes, transformed by selected discretisation approaches. The calculated ranking scores were used to control the selection of decision rules. The performance of the resulting rule classifiers was observed for the entire range of rejected variables, for decision rules with conditions on continuous values, with discretised conditions, and also for rules inferred from discrete data. The predictive powers were analysed and compared to detect existing trends. The experiments show that for all variants of the rule sets, not only was dimensionality reduction possible, but predictions were also improved, which validated the proposed methodology.

    Data irregularities in discretisation of test sets used for evaluation of classification systems: A case study on authorship attribution

    When patterns to be recognised are described by features of continuous type, discretisation becomes either an optional or a necessary step in the initial data pre-processing stage. Characteristics of the data, such as the distribution of data points in the input space, can significantly influence the transformation from real-valued into nominal attributes, and the resulting performance of classification systems employing them. If the data include several separate sets, their discretisation becomes more complex, as varying numbers of intervals and different ranges can be constructed for the same variables. The paper presents research on irregularities in data distribution, observed in the context of discretisation processes. Selected discretisation methods were used and their effect on the performance of decision algorithms, induced in the classical rough set approach, was investigated. The studied input space was defined by measurable style-markers, which, exploited as characteristic features, make it possible to treat the task of stylometric authorship attribution as classification.
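The remark that separate sets can yield different intervals for the same variable can be illustrated with a small sketch. Equal-width binning is used here only as a simple stand-in (the abstract does not say which methods were studied), and the values and the helper name `equal_width_cuts` are hypothetical.

```python
from bisect import bisect_right

def equal_width_cuts(values, bins):
    """Cut points splitting the observed range of values into equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]

# Hypothetical values of one attribute in a training set and a test set.
train = [0, 2, 4, 6, 8, 10]
test = [3, 5, 7, 9]

# Discretising each set independently gives different cut points.
train_cuts = equal_width_cuts(train, 2)
test_cuts = equal_width_cuts(test, 2)

# The same sample value lands in a different category depending on
# which set supplied the cut points.
sample = 5.5
category_by_train = bisect_right(train_cuts, sample)
category_by_test = bisect_right(test_cuts, sample)
```

Here the same value is assigned to the upper interval under the training cut points and to the lower interval under the test cut points, which is the kind of mismatch a rule classifier trained on one discretisation would encounter when evaluated on another.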